Subject clustering analysis based on ISI category classification
Abstract
The study analyzes the information flow among the ISI subject categories and aims to find an appropriate field structure of the Web of Science using the subject clustering algorithm developed in previous studies. The elaborate clustering of more than 8,000 journals and the clustering of the ISI subject categories provide two subject classification schemes from different perspectives and at different levels. The two clustering results have been compared, and their agreement and divergence have been analyzed. Several indicators have been used to compare the communication characteristics of different ISI subject categories. The neighbour map of each category clearly reflects the affinities between the “core” category and its surrounding satellites.

Introduction

A series of previous studies focused on journal clustering based on a complete journal-journal cross-citation matrix (Zhang et al., 2009; Janssens et al., 2009; Zhang et al., 2009). The Institute for Scientific Information (ISI) has assigned each journal it covers to one or more subject categories. Based on this classification scheme, the journal-journal matrix can be aggregated to a category-category matrix, which is much more densely populated than the matrix at the journal level.

The present study focuses on the analysis of the information flow among the ISI subject categories, for two important reasons. The first is to find an appropriate field structure of the Web of Science using the subject clustering algorithm developed in previous studies. Since ISI subject categories are based on journal assignment, the question arises of what changes if journal cross-citation is replaced by subject cross-citation. If the changes are not essential, the elaborate clustering of more than 8,000 journals could be substituted by a somewhat easier analysis of roughly 250 ISI categories, and the journal level could, as it were, be skipped. However, we stress that cross-citations are calculated from individual paper-to-paper links whatever aggregation level is chosen. The other reason is to analyze whether multiple journal assignment to subject categories interferes with, distorts or even determines the resulting cluster structure.

Before we introduce the methodological rudiments, we briefly summarise the historical background and the outcomes of previous or related studies. Along with the development of computerised scientometrics, the mapping of science has come to play an important role in constructing, learning, and disseminating the structure of science. For instance, a variety of techniques for analyzing journal-journal citation relationships have been reported in the literature to cluster scientific journals (Doreian and Fararo, 1985; Tijssen et al., 1987; Leydesdorff, 2006). An alternative method, co-citation clustering, has been investigated in constructing a World Atlas of Sciences for ISI (Garfield et al., 1975; Leydesdorff, 1987; Small, 1999). Boyack, Klavans, and Börner (2005) applied eight alternative measures of journal similarity to a dataset of 7,121 journals covering over one million documents in the combined Science Citation and Social Sciences Citation Indexes, and produced a global map of science using the force-directed graph layout tool VxOrd. Chen (2008) proposes an approach to classify scientific networks in terms of aggregated journal-journal citation relations of the ISI Journal Citation Reports using the affinity propagation method. As mentioned at the outset, Zhang et al. (2009) have also investigated different methods for the analysis and classification of scientific journals.

Besides using journals as the units of analysis, some recent research focuses on the structure of science based on subject categories. Glänzel and Schubert (2003) designed a new classification scheme of science fields and subfields for scientometric evaluation purposes. Moya-Anegon et al. (2004) proposed a new technique that uses thematic classification as entities of co-citation, and presented an egocentered network of 222 ISI categories including science and social sciences. Leydesdorff and Rafols (2009) classified the 172 ISI science categories into 14 groups based on factor analysis, and compared the interdisciplinarity of each category using betweenness centrality. In contrast to these studies, we applied a new clustering technique to classify the ISI science and social sciences categories into 7 groups based on the category-category cross-citation similarities, and further compared the results with the 7-cluster hybrid clustering solution of 8,305 journals from a previous study (Zhang et al., 2009). Furthermore, several indicators have been used to analyze the communication characteristics of different categories.

Data sources and processing

The data were collected from the Web of Science of Thomson Reuters (Philadelphia, PA, USA). Altogether, 9,487 journals assigned to the 246 categories of the sciences, social sciences, and arts and humanities over the period 2002-2006 were selected, and only four document types (article, note, letter and review) were taken into consideration. More than six million papers were indexed, and citations were summed over a variable citation window running from the publication year to 2006.
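The aggregation of a journal-journal cross-citation matrix to a category-category matrix, mentioned in the Introduction, can be illustrated with a small sketch. This is not the authors' implementation: the function name `aggregate_to_categories`, the toy journals and the toy categories are hypothetical, and the sketch assumes full counting, meaning that a journal assigned to several categories contributes its full citation count to each of them. The text does not say how multiple assignment is weighted.

```python
from collections import defaultdict


def aggregate_to_categories(journal_citations, journal_categories):
    """Aggregate journal-to-journal citation counts to category level.

    journal_citations: dict mapping (citing_journal, cited_journal) -> count
    journal_categories: dict mapping journal -> list of category names

    Assumption (not stated in the source): full counting, i.e. a link is
    counted once for every (citing category, cited category) pair that
    the two journals belong to.
    """
    category_citations = defaultdict(int)
    for (citing, cited), count in journal_citations.items():
        for citing_cat in journal_categories.get(citing, []):
            for cited_cat in journal_categories.get(cited, []):
                category_citations[(citing_cat, cited_cat)] += count
    return dict(category_citations)


# Toy data: hypothetical journals and categories, not taken from the study.
journal_categories = {
    "J1": ["Physics"],
    "J2": ["Physics", "Chemistry"],  # a journal with multiple assignment
    "J3": ["Chemistry"],
}
journal_citations = {
    ("J1", "J2"): 5,
    ("J2", "J3"): 2,
    ("J3", "J1"): 1,
}

category_matrix = aggregate_to_categories(journal_citations, journal_categories)
for (citing_cat, cited_cat), count in sorted(category_matrix.items()):
    print(citing_cat, "->", cited_cat, count)
```

In this toy case only 3 of the 9 journal-level cells are non-zero, while all 4 category-level cells are filled, which is the densification effect the Introduction describes. The doubly assigned journal J2 also shows how multiple assignment alone can create links between categories.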
Journal: J. Informetrics
Volume: 4
Publication date: 2010